本文介绍了用于合成近红外(NIR)图像生成和边界盒水平检测系统的数据集。不可否认的是,诸如Tensorflow或Pytorch之类的高质量机器学习框架以及大规模的Imagenet或可可数据集借助于加速GPU硬件,已将机器学习技术的极限推向了数十多年。在这些突破中,高质量的数据集是可以在模型概括和数据驱动的深神经网络的部署方面取得成功的基本构件之一。特别是,综合数据生成任务通常比其他监督方法需要更多的培训样本。因此,在本文中,我们共享从两个公共数据集(即Nirscene和Sen12ms)和我们的新颖NIR+RGB甜椒(辣椒(辣椒)数据集)重新处理的NIR+RGB数据集。我们定量和定性地证明了这些NIR+RGB数据集足以用于合成NIR图像生成。对于NIRSCENE1,SEN12MS和SEWT PEPPER数据集,我们实现了第11.36、26.53、26.53、26.53和40.15的距离(FID)。此外,我们发布了11个水果边界盒的手动注释,可以使用云服务将其作为各种格式导出。四个新添加的水果[蓝莓,樱桃,猕猴桃和小麦]化合物11新颖的边界盒数据集,在我们先前的DeepFruits项目中提出的作品[Apple,Appsicum,Capsicum,Capsicum,Mango,Orange,Rockmelon,Strawberry]。数据集的边界框实例总数为162K,可以从云服务中使用。为了评估数据集,YOLOV5单阶段检测器被利用并报告了令人印象深刻的平均水平前期,MAP [0.5:0.95]的结果为[min:0.49,最大:0.812]。我们希望这些数据集有用,并作为未来研究的基准。
translated by 谷歌翻译
大型语言模型可以编码有关世界的大量语义知识。这种知识对于旨在采取自然语言表达的高级,时间扩展的指示的机器人可能非常有用。但是,语言模型的一个重大弱点是,它们缺乏现实世界的经验,这使得很难利用它们在给定的体现中进行决策。例如,要求语言模型描述如何清洁溢出物可能会导致合理的叙述,但是它可能不适用于需要在特定环境中执行此任务的特定代理商(例如机器人)。我们建议通过预处理的技能来提供现实世界的基础,这些技能用于限制模型以提出可行且在上下文上适当的自然语言动作。机器人可以充当语​​言模型的“手和眼睛”,而语言模型可以提供有关任务的高级语义知识。我们展示了如何将低级技能与大语言模型结合在一起,以便语言模型提供有关执行复杂和时间扩展说明的过程的高级知识,而与这些技能相关的价值功能则提供了连接必要的基础了解特定的物理环境。我们在许多现实世界的机器人任务上评估了我们的方法,我们表明了对现实世界接地的需求,并且这种方法能够在移动操纵器上完成长远,抽象的自然语言指令。该项目的网站和视频可以在https://say-can.github.io/上找到。
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译
Modern deep neural networks have achieved superhuman performance in tasks from image classification to game play. Surprisingly, these various complex systems with massive amounts of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have theoretically shown the global solutions to the training network problem under a simplified "unconstrained feature model" exhibiting this phenomenon. We take a step further and prove the Neural Collapse occurrence for deep linear network for the popular mean squared error (MSE) and cross entropy (CE) loss. Furthermore, we extend our research to imbalanced data for MSE loss and present the first geometric analysis for Neural Collapse under this setting.
translated by 谷歌翻译
Machine Reading Comprehension has become one of the most advanced and popular research topics in the fields of Natural Language Processing in recent years. The classification of answerability questions is a relatively significant sub-task in machine reading comprehension; however, there haven't been many studies. Retro-Reader is one of the studies that has solved this problem effectively. However, the encoders of most traditional machine reading comprehension models in general and Retro-Reader, in particular, have not been able to exploit the contextual semantic information of the context completely. Inspired by SemBERT, we use semantic role labels from the SRL task to add semantics to pre-trained language models such as mBERT, XLM-R, PhoBERT. This experiment was conducted to compare the influence of semantics on the classification of answerability for the Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder for the Retro-Reader model's Sketchy Reading Module. The improved Retro-Reader model's encoder with semantics was first applied to the Vietnamese Machine Reading Comprehension task and obtained positive results.
translated by 谷歌翻译
Robust Markov decision processes (RMDPs) are promising models that provide reliable policies under ambiguities in model parameters. As opposed to nominal Markov decision processes (MDPs), however, the state-of-the-art solution methods for RMDPs are limited to value-based methods, such as value iteration and policy iteration. This paper proposes Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs with a global convergence guarantee in tabular problems. Unlike value-based methods, DRPG does not rely on dynamic programming techniques. In particular, the inner-loop robust policy evaluation problem is solved via projected gradient descent. Finally, our experimental results demonstrate the performance of our algorithm and verify our theoretical guarantees.
translated by 谷歌翻译
Language models (LMs) have demonstrated remarkable performance on downstream tasks, using in-context exemplars or human instructions. Recent works have shown that chain-of-thought (CoT) prompting can elicit models to solve complex reasoning tasks, step-by-step. However, the efficacy of prompt-based CoT methods is restricted to very large LMs such as GPT-3 (175B), thus limiting deployability. In this paper, we revisit the fine-tuning approach to enable complex reasoning in smaller LMs, optimized to efficiently perform a specific task. We propose Fine-tune-CoT, a method that leverages the capabilities of very large LMs to generate reasoning samples and teach smaller models via fine-tuning. We evaluate our method on publicly available LMs across a wide range of complex tasks and model sizes. We find that Fine-tune-CoT enables substantial reasoning capability in small models, whereas previous prompt-based baselines exhibit near-random performance. Student models can even outperform the teacher in some tasks while reducing model size requirements by several orders of magnitude. We conduct extensive ablations and sample studies to understand the reasoning capabilities of student models. We also identify several important nuances that have been overlooked in concurrent fine-tuning works on CoT and address them in our analysis.
translated by 谷歌翻译
Cartoonization is a task that renders natural photos into cartoon styles. Previous deep cartoonization methods only have focused on end-to-end translation, which may hinder editability. Instead, we propose a novel solution with editing features of texture and color based on the cartoon creation process. To do that, we design a model architecture to have separate decoders, texture and color, to decouple these attributes. In the texture decoder, we propose a texture controller, which enables a user to control stroke style and abstraction to generate diverse cartoon textures. We also introduce an HSV color augmentation to induce the networks to generate diverse and controllable color translation. To the best of our knowledge, our work is the first deep approach to control the cartoonization at inference while showing profound quality improvement over to baselines.
translated by 谷歌翻译
Semi-supervised object detection is important for 3D scene understanding because obtaining large-scale 3D bounding box annotations on point clouds is time-consuming and labor-intensive. Existing semi-supervised methods usually employ teacher-student knowledge distillation together with an augmentation strategy to leverage unlabeled point clouds. However, these methods adopt global augmentation with scene-level transformations and hence are sub-optimal for instance-level object detection. In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection. In this way, the resultant augmentor is derived to emphasize object instances rather than irrelevant backgrounds, making the augmented data more useful for object detector training. Extensive experiments on the ScanNet and SUN RGB-D datasets show that the proposed OPA performs favorably against the state-of-the-art methods under various experimental settings. The source code will be available at https://github.com/nomiaro/OPA.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译